@pabigot
Contributor

@pabigot pabigot commented Feb 15, 2020

SUPERSEDED by #23333

Addressing #22494 in support of #21538 this PR:

  • extracts the async notification infrastructure from the on-off service into a self-contained structure that can be used anywhere asynchronous operations may be used;
  • recasts the on-off service as a manager that is to be used by services;
  • adds a manager for queued operations that handles all the queue and event processing generically, supported by service-specific helper functions for validation, processing, and completion notification.
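
To make the first bullet concrete, here is a minimal sketch of what a self-contained notification structure could look like. All `demo_` names are hypothetical and are not the PR's API. The only behaviour taken from this conversation is that fetching a result before completion yields `-EAGAIN` (as the `async_notify_fetch_result` test line in the checkpatch output suggests).

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

struct demo_notify;

/* Hypothetical completion callback type. */
typedef void (*demo_notify_cb)(struct demo_notify *n, int res);

struct demo_notify {
	demo_notify_cb cb;  /* NULL means the client polls for the result */
	bool done;          /* set once the operation has completed */
	int result;         /* valid only when done is true */
};

static void demo_notify_init(struct demo_notify *n, demo_notify_cb cb)
{
	n->cb = cb;
	n->done = false;
	n->result = 0;
}

/* Returns -EAGAIN until the operation completes, then 0 with *res set. */
static int demo_notify_fetch_result(const struct demo_notify *n, int *res)
{
	if (!n->done) {
		return -EAGAIN;
	}
	*res = n->result;
	return 0;
}

/* Called by the service when the operation completes. */
static void demo_notify_finalize(struct demo_notify *n, int res)
{
	n->result = res;
	n->done = true;
	if (n->cb != NULL) {
		n->cb(n, res);
	}
}

/* Example callback that only counts invocations. */
static int demo_cb_count;

static void demo_count_cb(struct demo_notify *n, int res)
{
	(void)n;
	(void)res;
	demo_cb_count++;
}
```

A polling client would initialize with a NULL callback and call `demo_notify_fetch_result()` until it stops returning `-EAGAIN`. A callback client would pass a function such as `demo_count_cb`.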

See documentation at https://builds.zephyrproject.org/zephyrproject-rtos/zephyr/22853/docs/reference/index.html by digging into the latest results of the Documentation Github Action, finding the artifact, downloading it, uncompressing this, untarring that, and pointing your browser at the html.

@zephyrbot zephyrbot added area: API Changes to public APIs area: Tests Issues related to a particular existing or missing test area: Documentation labels Feb 15, 2020
@zephyrbot

zephyrbot commented Feb 15, 2020

All checks are passing now.

checkpatch (informational only, not a failure)

-:638: WARNING:LONG_LINE: line over 80 characters
#638: FILE: include/sys/async_notify.h:315:
+					      async_notify_generic_callback handler)

-:1051: WARNING:LONG_LINE: line over 80 characters
#1051: FILE: include/sys/queued_operation.h:44:
+#define QUEUED_OPERATION_PRIORITY_APPEND ((int)QUEUED_OPERATION_PRIORITY_MASK + 1)

-:1060: WARNING:LONG_LINE: line over 80 characters
#1060: FILE: include/sys/queued_operation.h:53:
+#define QUEUED_OPERATION_PRIORITY_PREPEND ((int)QUEUED_OPERATION_PRIORITY_MASK + 2)

-:1075: WARNING:LONG_LINE: line over 80 characters
#1075: FILE: include/sys/queued_operation.h:68:
+#define QUEUED_OPERATION_EXTENSION_POS (QUEUED_OPERATION_PRIORITY_POS + QUEUED_OPERATION_PRIORITY_BITS)

-:1252: WARNING:LONG_LINE: line over 80 characters
#1252: FILE: include/sys/queued_operation.h:245:
+static inline int queued_operation_fetch_result(const struct queued_operation *op,

-:1666: WARNING:LONG_LINE: line over 80 characters
#1666: FILE: lib/os/queued_operation.c:83:
+static inline void trivial_start_and_unlock(struct queued_operation_manager *mgr,

-:1682: WARNING:LONG_LINE: line over 80 characters
#1682: FILE: lib/os/queued_operation.c:99:
+	u32_t mask = (QUEUED_OPERATION_PRIORITY_MASK << QUEUED_OPERATION_PRIORITY_POS);

-:1702: WARNING:LONG_LINE: line over 80 characters
#1702: FILE: lib/os/queued_operation.c:119:
+	async_notify_generic_callback cb = async_notify_finalize(&op->notify, res);

-:1713: WARNING:LONG_LINE_COMMENT: line over 80 characters
#1713: FILE: lib/os/queued_operation.c:130:
+ * @param from either ST_STARTING or ST_STOPPING depending on transition direction

-:1747: WARNING:LINE_SPACING: Missing a blank line after declarations
#1747: FILE: lib/os/queued_operation.c:164:
+	sys_slist_t ops = mgr->operations;
+	sys_slist_init(&mgr->operations);

-:1838: WARNING:LONG_LINE_COMMENT: line over 80 characters
#1838: FILE: lib/os/queued_operation.c:255:
+			/* If an operation finalized during notification we need to

-:1839: WARNING:LONG_LINE_COMMENT: line over 80 characters
#1839: FILE: lib/os/queued_operation.c:256:
+			 * reselect because finalization couldn't do that, otherwise

-:2083: WARNING:EMBEDDED_FUNCTION_NAME: Prefer using '"%s...", __func__' to using 'callback', this function's name, in a string
#2083: FILE: tests/lib/async_notify/src/main.c:28:
+		      "failed callback fetch");

-:2903: WARNING:LONG_LINE: line over 80 characters
#2903: FILE: tests/lib/queued_operation/src/main.c:216:
+	.manager = QUEUED_OPERATION_MANAGER_INITIALIZER(&vtable, &service.onoff),

-:2913: WARNING:LONG_LINE: line over 80 characters
#2913: FILE: tests/lib/queued_operation/src/main.c:226:
+		.manager = QUEUED_OPERATION_MANAGER_INITIALIZER(&vtable, &service.onoff),

-:3054: WARNING:LONG_LINE: line over 80 characters
#3054: FILE: tests/lib/queued_operation/src/main.c:367:
+		rc = service_submit(&service, &operation[i], pri_order[i].priority);

-:3105: WARNING:LONG_LINE: line over 80 characters
#3105: FILE: tests/lib/queued_operation/src/main.c:418:
+		rc = service_submit(&service, &operation[i], pri_order[i].priority);

-:3150: WARNING:LONG_LINE_COMMENT: line over 80 characters
#3150: FILE: tests/lib/queued_operation/src/main.c:463:
+		{ 0, 0 },       /* first because it gets grabbed when submitted */

-:3151: WARNING:LONG_LINE_COMMENT: line over 80 characters
#3151: FILE: tests/lib/queued_operation/src/main.c:464:
+		{ 0, 2 },       /* delayed by submit of higher priority during callback */

-:3173: WARNING:LONG_LINE: line over 80 characters
#3173: FILE: tests/lib/queued_operation/src/main.c:486:
+			zassert_equal(async_notify_fetch_result(np[i], &res), -EAGAIN,

-:3282: WARNING:LONG_LINE: line over 80 characters
#3282: FILE: tests/lib/queued_operation/src/main.c:595:
+		      "unsupported callback check failed: %d != %d", rc, expect);

-:3297: WARNING:LONG_LINE: line over 80 characters
#3297: FILE: tests/lib/queued_operation/src/main.c:610:
+		      "unsupported priority check failed: %d != %d", rc, expect);

-:3339: WARNING:LONG_LINE: line over 80 characters
#3339: FILE: tests/lib/queued_operation/src/main.c:652:
+				      "submit failed: %d != %d", rc, service.validate_rv);

-:3513: WARNING:LONG_LINE: line over 80 characters
#3513: FILE: tests/lib/queued_operation/src/main.c:826:
+	struct onoff_service_transitions onoff_transitions = *service.onoff.transitions;

-:3578: WARNING:LONG_LINE: line over 80 characters
#3578: FILE: tests/lib/queued_operation/src/main.c:891:
+	struct onoff_service_transitions onoff_transitions = *service.onoff.transitions;

- total: 0 errors, 25 warnings, 3405 lines checked

NOTE: For some of the reported defects, checkpatch may be able to
      mechanically convert to the typical style using --fix or --fix-inplace.

Your patch has style problems, please review.

NOTE: Ignored message types: AVOID_EXTERNS BRACES CONFIG_EXPERIMENTAL CONST_STRUCT DATE_TIME FILE_PATH_CHANGES MINMAX NETWORKING_BLOCK_COMMENT_STYLE PRINTK_WITHOUT_KERN_LEVEL SPLIT_STRING VOLATILE

NOTE: If any of the errors are false positives, please report
      them to the maintainers.

Tip: The bot edits this comment instead of posting a new one, so you can check the comment's history to see earlier messages.

Contributor

@nordic-krch nordic-krch left a comment

A few comments. I will continue tomorrow.

if (rv < 0) {
return rv;
}

Contributor

In the majority of cases there will probably be only a single operation, so it would be good to optimize for this. I've been thinking about two things:

  • using only current to indicate that the module is busy; the finalizing stage can be encoded with an invalid address (e.g. UINTPTR_MAX)
  • when that is done, processing can be started immediately and locklessly (note that atomic.h must be extended for that: samples: fxos8700-hid: Fix using gpio_pin_get() #22680):
if (atomic_cas((atomic_t *)&mgr->current, (atomic_t)NULL, (atomic_t)op)) {
	mgr->vtable->process(mgr, op);
	return rv;
}

I tried it (and adapted how the next op is fetched) and measured the time, which went from 5.5us to 3.9us on nrf52.
I was checking whether it is feasible (nordic-krch@47371da).
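
A minimal sketch of the fast path described above, written with C11 `<stdatomic.h>` instead of Zephyr's `atomic.h` so that it stands alone. The `demo_` types are hypothetical stand-ins for the PR's manager and operation structures, and the locked slow path that queues the operation when the manager is busy is deliberately left out.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins; not the real Zephyr definitions. */
struct demo_op {
	int id;
};

struct demo_mgr {
	_Atomic(struct demo_op *) current;  /* NULL while the manager is idle */
	int processed;                      /* counts process() calls for the demo */
};

static void demo_process(struct demo_mgr *mgr, struct demo_op *op)
{
	(void)op;
	mgr->processed++;
}

/* Try to claim an idle manager without taking a lock.
 * Returns true if op became the current operation and processing started.
 * Returns false if the manager was busy; the caller would then fall back
 * to the locked path that queues op (not shown).
 */
static bool demo_submit_fast_path(struct demo_mgr *mgr, struct demo_op *op)
{
	struct demo_op *expected = NULL;

	if (atomic_compare_exchange_strong(&mgr->current, &expected, op)) {
		demo_process(mgr, op);
		return true;
	}
	return false;
}
```

The trade-off debated in the replies is visible here: the idle check and the claim collapse into one compare-and-swap, but every other state (such as "finalizing") then has to be encoded in the same pointer.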

Contributor Author

OK, I've addressed all the concerns except this one. (Aside: I think you have the wrong link for an extension to atomic.h.)

I'm not taking this suggestion, as I think it adds significant complexity (magic non-pointer values stored in pointers; a new synchronization mechanism) with little real value (1.5 us per submitted operation).

Contributor

I'm not taking this suggestion, as I think it adds significant complexity

Here I disagree, because:

  • a single variable holds the busy state, which is simpler
  • selecting the next operation to process is simplified
  • speed-up (you say 1.5us, I say 30%)
  • slightly less ROM (150 bytes less)

*
* This function must be invoked by services that support queued
* operations when the operation provided to them through the process
* function have been completed. It is not intended to be invoked by
Contributor

Maybe it's worth stressing that this should be called at the end of completion notification, because it may result in a process() call with a NULL operation before the actual action is performed. For example, if the process function is synchronous then finalizing may look like:

void process(struct queued_operation_manager *mgr,
		struct queued_operation *op)
{
	if (op) {
		queued_operation_finalize(mgr, op, 0);
		// do some processing
	} else {
		// shut down resources
	}
}

In the case above process(mgr, NULL) will be called in the context of the queued_operation_finalize() call, before the actual action, and that's probably unintended.

It is implicitly explained in the sentence "During the call..." but I wonder if it is enough.
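
A small simulation of the ordering being asked for, under the assumption (taken from this comment) that finalizing the last operation re-enters process() with a NULL operation. The `demo_` functions are hypothetical stand-ins. The point is only that the work is done first and finalize is the last call.

```c
#include <stddef.h>

struct demo_op {
	int result;
};

struct demo_mgr {
	char log[8];  /* event trace: W = work, F = finalize, S = shut down */
	size_t len;
};

static void demo_process(struct demo_mgr *mgr, struct demo_op *op);

static void demo_log(struct demo_mgr *mgr, char event)
{
	if (mgr->len + 1 < sizeof(mgr->log)) {
		mgr->log[mgr->len++] = event;
		mgr->log[mgr->len] = '\0';
	}
}

/* Stand-in for the finalize call: records the result and, because no
 * other operation is queued in this demo, tells the service it is idle
 * by calling process() with NULL before returning.
 */
static void demo_finalize(struct demo_mgr *mgr, struct demo_op *op, int res)
{
	op->result = res;
	demo_log(mgr, 'F');
	demo_process(mgr, NULL);
}

static void demo_process(struct demo_mgr *mgr, struct demo_op *op)
{
	if (op != NULL) {
		demo_log(mgr, 'W');         /* do the actual processing first */
		demo_finalize(mgr, op, 0);  /* finalize as the very last step */
	} else {
		demo_log(mgr, 'S');         /* shut down resources */
	}
}
```

With this ordering the trace is work, finalize, shut down. Swapping the two statements in the non-NULL branch would shut the resources down before the work runs.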

@nordic-krch
Contributor

I've been wondering about a way to use the manager without priority, to avoid traversing the whole list (in locked context) on each submit. E.g. use some reserved priority value which would mean just append to the list.
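
A sketch of that idea, assuming a singly linked queue with a tail pointer. `DEMO_PRIORITY_APPEND` and the `demo_` types are hypothetical and are not the PR's definitions; lower priority values are assumed to be served first.

```c
#include <limits.h>
#include <stddef.h>

/* Hypothetical reserved value meaning "skip the search, append at the tail". */
#define DEMO_PRIORITY_APPEND INT_MAX

struct demo_op {
	struct demo_op *next;
	int priority;  /* lower value is served first */
};

struct demo_queue {
	struct demo_op *head;
	struct demo_op *tail;
};

static void demo_enqueue(struct demo_queue *q, struct demo_op *op, int priority)
{
	struct demo_op **link = &q->head;

	op->next = NULL;
	op->priority = priority;

	if (q->head == NULL) {
		q->head = op;
		q->tail = op;
		return;
	}

	if (priority == DEMO_PRIORITY_APPEND) {
		/* Constant time: no list traversal while locked. */
		q->tail->next = op;
		q->tail = op;
		return;
	}

	/* Linear search: insert after all entries of equal or higher priority. */
	while (*link != NULL && (*link)->priority <= priority) {
		link = &(*link)->next;
	}
	op->next = *link;
	*link = op;
	if (op->next == NULL) {
		q->tail = op;
	}
}
```

Because the reserved value is also the largest priority value, operations submitted with an ordinary priority still land ahead of appended ones.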

In the manager you don't provide functions for initializing async_notify; maybe onoff should then deprecate its own and also expect the user to call async_notify directly.

I'm done with the review. I like this API. What about splitting the PR in two (async notify first)? The manager is missing tests and will probably take longer to review.

@pabigot
Contributor Author

pabigot commented Feb 18, 2020

You're right that process() needs to be documented as re-entrant (a term we haven't yet formally defined), and that if it's synchronous that could cause stack problems due to recursion. I need to think about how to document that. I don't really expect this to be used for synchronous operations, except in testing, because you don't need a queue for them.

I've been wondering about a way to use the manager without priority, to avoid traversing the whole list (in locked context) on each submit. E.g. use some reserved priority value which would mean just append to the list.

I think that can be done, yes.

In the manager you don't provide functions for initializing async_notify; maybe onoff should then deprecate its own and also expect the user to call async_notify directly.

You may be right. I don't want to deprecate anything right now, because I do want to rename onoff_service to onoff_manager for terminology reasons, and that requires more thought.

I'm done with the review. I like this API. What about splitting the PR in two (async notify first)? The manager is missing tests and will probably take longer to review.

I'm OK with splitting it, but don't understand the comment about missing tests. The manager has 100% line-coverage in its test.

@nordic-krch
Contributor

but don't understand the comment about missing tests. The manager has 100% line-coverage in its test.

Were they in the PR yesterday? I didn't see them then, but they are in fact included.

@pabigot
Contributor Author

pabigot commented Feb 18, 2020

Were they in the PR yesterday? I didn't see them then, but they are in fact included.

They've been in the PR since it was first submitted.

@nordic-krch
Contributor

They've been in the PR since it was first submitted.

Never mind, I must have missed them somehow.

Member

I propose to move this out of kernel/other and put it directly under doc/reference under a new category, for example doc/reference/services or whatever.

@jfischer-no jfischer-no self-requested a review February 18, 2020 17:58
@pabigot
Contributor Author

pabigot commented Feb 19, 2020

Latest push has no change to implementation, just moves the documentation out of the kernel area. Also rebased on current master.

@pabigot pabigot marked this pull request as ready for review February 19, 2020 13:29
@pabigot
Contributor Author

pabigot commented Feb 20, 2020

Rebased on top of #22419 to avoid future merge conflicts and simplify development targeting next release.

Member

In general I think we are going way too far with name lengths in Zephyr.
I suggest we rename this to anotify, here and everywhere else where async_notify appears.

Contributor

Agree with @carlescufi, most of the names are too long:
async_notify_generic_callback
async_notify_fetch_result
async_notify_init_spinwait
ONOFF_SERVICE_TRANSITIONS_INITIALIZER
ASYNC_NOTIFY_METHOD_COMPLETED
ASYNC_NOTIFY_EXTENSION_MASK

Contributor

I have a feeling this header is not namespacing stuff properly.
Shouldn't everything be prefixed with sys_?

Member

Again, this is too long. I suggest qdop or something similar.

@pabigot
Contributor Author

pabigot commented Feb 21, 2020

In the absence of clear and consistent guidance naming seems to largely depend on personal taste. I prefer names that unambiguously identify the capability, and (try to) use them consistently for the related data structures, functions, and constants. If that's not the consensus primary consideration for Zephyr, I'll change it to whatever it's supposed to be.

Whether that includes adding the prefix sys_, and the implications of using that prefix, are also things people can discuss. Many things accessed through <sys/foo.h> don't have that prefix, especially things that aren't closely tied to the kernel, which is the case with these APIs.

Member

The function should not continue when these two pointers are not equal, i.e. an attempt is made to finalize an operation that is not currently processed.

Contributor Author

That's intentional; operations that are cancelled are finalized to -ECANCELED but are not the current operation. However, handling of the finalize field is incorrect in that case, in a way I thought I'd fixed....

Contributor

But queued_operation_finalize() is no longer called from the queued_operation_cancel() context. It is only an API call now. I think in that case struct queued_operation *op should be removed from the argument list.

@jfischer-no
Contributor

In the absence of clear and consistent guidance naming seems to largely depend on personal taste. I prefer names that unambiguously identify the capability, and (try to) use them consistently for the related data structures, functions, and constants. If that's not the consensus primary consideration for Zephyr, I'll change it to whatever it's supposed to be.

There is one piece of guidance: "In general, follow the Linux kernel coding style..."
https://www.kernel.org/doc/html/v4.10/process/coding-style.html#naming
But a few names here are too long. I think callback can be shortened to cb and, as @carlescufi suggested, async_notify to anotify. IMHO sys_ is not necessary.

@nordic-krch
Contributor

Regarding names, I also think that they are OK. I'm for the rule: use abbreviations that are common, otherwise the full word, unless you want to establish a new common abbreviation (like dt in the Zephyr/Linux world). async is already an abbreviation. As for callback, I agree that it could be cb or clbk, but I would keep queued and maybe consider op instead of operation, since that is a rather common abbreviation.

@pabigot pabigot force-pushed the nordic/asyncnot branch 4 times, most recently from 89007c2 to 330183e Compare February 24, 2020 19:07
Contributor

@mbolivar mbolivar left a comment

A few drive by typos.

@nordic-krch
Contributor

@pabigot I've been trying to get back to the review after a couple of days and noticed a complexity increase. I came to the conclusion that the reason is the loop in select_next_and_unlock(). The loop is necessary to handle the notification about no pending operations, which, contrary to other process calls, is synchronous (there is no callback to call queued_operation_finalize()). Its goal is to notify about going idle, but IMO it has limitations because it is synchronous. It transfers complexity to the module underneath because:

  • the module must maintain state to ramp up on the first request after idle
  • if going idle is asynchronous (e.g. scheduling an I2C transfer to enter low power mode) then the module must support handling (queueing) a request that arrives during the transition to idle, so the module underneath can no longer be a simple executor of asynchronous operations.

Because of that I would like to propose the following change in the API: add setup(mgr, callback) and teardown(mgr, callback). Those functions are optional, and since they only drive internal manager state the callback can be something like callback(struct queued_operation_manager *mgr, int res).
With that change every operation in the manager is asynchronous, so the management simplifies significantly. Also, those functions are optional, unlike the process(mgr, NULL) call.
Note that I started to work on an asynchronous sequence manager where I used a similar approach: https://github.com/nordic-krch/zephyr/blob/async_seq/include/sys/async_seq.h
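
One way to read the proposal, sketched with hypothetical `demo_` names (the real vtable layout is not shown in this conversation): setup and teardown are optional entries that report completion through a callback, and a missing entry is treated as an immediate success.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

struct demo_mgr;

/* Hypothetical completion callback for setup/teardown transitions. */
typedef void (*demo_transition_cb)(struct demo_mgr *mgr, int res);

struct demo_vtable {
	/* Both entries are optional and may be NULL. */
	void (*setup)(struct demo_mgr *mgr, demo_transition_cb cb);
	void (*teardown)(struct demo_mgr *mgr, demo_transition_cb cb);
};

struct demo_mgr {
	const struct demo_vtable *vtable;
	bool ready;    /* true once setup completed successfully */
	int last_res;  /* result of the most recent transition */
};

static void demo_setup_done(struct demo_mgr *mgr, int res)
{
	mgr->last_res = res;
	mgr->ready = (res >= 0);
}

/* Leave idle: run the optional setup step, or succeed immediately. */
static void demo_start(struct demo_mgr *mgr)
{
	if (mgr->vtable->setup != NULL) {
		mgr->vtable->setup(mgr, demo_setup_done);
	} else {
		demo_setup_done(mgr, 0);
	}
}

/* Example services. A real one would invoke cb later, e.g. from an ISR. */
static void demo_setup_ok(struct demo_mgr *mgr, demo_transition_cb cb)
{
	cb(mgr, 0);
}

static void demo_setup_fail(struct demo_mgr *mgr, demo_transition_cb cb)
{
	cb(mgr, -EIO);
}
```

Teardown would be handled symmetrically. The reply that follows argues that an onoff manager already provides this kind of transition handling.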

@pabigot
Contributor Author

pabigot commented Feb 25, 2020

It's not immediately obvious how adding two optional functions to the API simplifies anything, because there's an inherent complexity to notification/processing/finalization that was rediscovered as @anangl and I went through the previous comments.

However you raise an important technical point: process() is async when given a non-null operation parameter, but the manager state machine requires it to be synchronous when being told no more work is available, because the manager immediately transitions to IDLE. In fact the startup and shutdown operations may be asynchronous, which as you point out complicates the service implementation because it must deal with subsequent notifications while it's still in transition. That is completely contrary to the goal.

But the limitations you describe are exactly the sort of thing the onoff manager is intended to support. So the best way to deal with this may be to add an optional onoff service pointer to the manager state. (This is also an "eat your own dogfood" thing, because the set of manager capabilities were always intended to be components that could be composed to support specific services.)

If the queued operation manager does not expose an onoff service, then the manager considers transitions from OFF to ON to be handled internal to the service. The existing manager state machine has to stay because it's what gets executed when the service is on, and process() is a straightforward async function that is invoked once for each operation to be processed, with no null pointer magic as an out-of-band indication (that's done by telling the service to turn off).

I think this has some promise, and will dig into it. Thanks for raising this concern.

Contributor

Shouldn't it contain the result? I know that it is stored inside the operation's async_notify, but it will definitely be needed, and fetching it inside the callback seems strange as it is already available when the callback is invoked.

Contributor Author

I don't think it's necessary. The intermediary translation function provided by the service can extract the result if passing it to the end user as a distinct argument is useful. In a probably common case where the completed operation is simply appended to another list for processing in another context it isn't needed.

@pabigot
Contributor Author

pabigot commented Mar 3, 2020

So the best way to deal with this may be to add an optional onoff service pointer to the manager state.

I've reworked the state machine to use an optional onoff manager to handle the transitions between OFF (no operations pending) and ON when the service needs to be informed about them. This involved some minor rework of the inner handling of success/error indicators.

The management is slightly simpler now for states specific to transitioning queued operations (IDLE, NOTIFYING, PROCESSING, FINALIZING).

Introduction of the onoff manager introduces the concept of an error in the queued operation manager. There will have to be a new API to detect and react to that; it'll come soon.

I plan to close this PR soon and split out the pieces related only to async notify and onoff manager to another PR which is more likely to be merged quickly. A new PR in draft format based on that rework will be added to continue work on this feature. Any comments on the changes made here will be addressed in the initial post of that draft PR.

Full asynchronous support for APIs such as bus transactions generally
requires managing operations from unrelated clients.  This API provides
require managing operations from unrelated clients.  This API provides
a data structure and functions to manage those operations generically,
leaving the service to provide only the service-specific operation
description and implementation.

Signed-off-by: Peter Bigot <peter.bigot@nordicsemi.no>
/* Pointer to an on-off service supporting this service. NULL
* if service is always-on.
*/
struct onoff_service *onoff;
Contributor

I'm not convinced that onoff is actually needed here. Onoff is for managing on-off transitions when multiple users are present. That's not the case here, or at least qop_manager is not aware of it. It enables the service when leaving idle and shuts it down when the queue is empty. IMO a better approach would be to have asynchronous setup and teardown functions that result in a callback with the result. It might be that specific implementations of those functions use an onoff service, but that's beyond this manager.

Contributor Author

I'm somewhat sympathetic to that position, but I'm going to stick with this approach for now.

A driving goal of the manager framework is to provide components that can be composed to solve complex problems without re-implementing things and trying to handle all the edge cases (e.g., invocation from within an interrupt handler). The onoff manager already provides all the necessary logic to safely and correctly handle an asynchronous transition between an off and on state, regardless of calling context and current underlying state. If it does something wrong, it needs to be fixed. Problems are more likely to be found and addressed in the onoff manager component, than in edge cases that don't get exercised by a specific queued operation manager callback solution.

Also, at least in the near term, the underlying services used by a queued operation manager (e.g. I2C or SPI) may continue to be used without a queued operation manager. They may also need to be able to be turned on and off as we start to inch towards functional power management. If that capability is provided through the onoff manager abstraction then it's available for both ways of using the service.

* function have been completed. It is not intended to be invoked by
* users of a service.
*
* During the call to this function the service process function will
Contributor

That's no longer true. The process function is not used to indicate that no operations are pending.

}
}

int queued_operation_submit(struct queued_operation_manager *mgr,
Contributor

This function is getting quite long. Could you split it, e.g. by extracting the insertion into the list into a separate function?

u32_t state = mgr->state;

if (state == ST_ERROR) {
rv = -ENODEV;
Contributor

Interrupts are still locked here when the function returns.
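
A common way to avoid that class of bug is a single exit path that always releases the lock. The sketch below uses a counter as a stand-in for the real lock so it can run anywhere. `demo_lock()` and `demo_unlock()` are hypothetical and only mimic the key-based lock/unlock pattern.

```c
#include <errno.h>

enum demo_state {
	DEMO_ST_IDLE,
	DEMO_ST_ERROR,
};

struct demo_mgr {
	enum demo_state state;
	int lock_depth;  /* stand-in for the lock: must be 0 after every call */
	int queued;      /* number of operations accepted */
};

static int demo_lock(struct demo_mgr *mgr)
{
	mgr->lock_depth++;
	return 0;  /* pretend key */
}

static void demo_unlock(struct demo_mgr *mgr, int key)
{
	(void)key;
	mgr->lock_depth--;
}

static int demo_submit(struct demo_mgr *mgr)
{
	int rv = 0;
	int key = demo_lock(mgr);

	if (mgr->state == DEMO_ST_ERROR) {
		rv = -ENODEV;
		goto out;  /* do not return directly: the lock is still held */
	}

	mgr->queued++;

out:
	demo_unlock(mgr, key);
	return rv;
}
```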

* This function can be called as a side effect of
* queued_operation_submit() or queued_operation_finalize() to
* tell the service that a new operation needs to be
* processed, or that there are no operations left to process.
Contributor

no longer true

{
bool loop = false;

do {
Contributor

I still don't understand why you need a loop here. The only benefit I see is that if the process function is synchronous then process is not called recursively, but this isn't a goal of this module. The user is already warned in the header file that a synchronous process may lead to recursive calls. We don't really expect process to be synchronous, because if it were there would be no point in queuing operations.

The implementation is very complex now, with multiple places where interrupts are locked and then unlocked in other functions. It's really hard to understand, especially since the flow seems simple, with only two forks: setup failure, and a new request pending after teardown.

Contributor Author

The last time I incrementally refactored this based on a belief a loop wasn't necessary, it turned out that it was necessary. So I'm going to prioritize correctness. We can reconsider whether it's possible to avoid looping to react to events received during unlocked periods after the solution has been shown to work correctly and be functionally complete.
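
For readers following the loop-versus-recursion argument, here is a deliberately simplified sketch of the pattern, not the PR's select_next_and_unlock(). All `demo_` names are hypothetical and locking is omitted. A finalize that arrives while selection is still running only sets a flag, and the loop picks up the next operation instead of nesting another process() call.

```c
#include <stdbool.h>

struct demo_mgr {
	int pending;     /* operations still queued */
	bool selecting;  /* true while demo_select_next() is running */
	bool reselect;   /* set when a finalize arrived during selection */
	int depth;       /* current nesting of demo_process() */
	int max_depth;   /* deepest nesting observed */
	int completed;   /* operations finalized so far */
};

static void demo_select_next(struct demo_mgr *mgr);

static void demo_finalize(struct demo_mgr *mgr)
{
	mgr->completed++;
	if (mgr->selecting) {
		/* Called from inside process(): let the loop pick the next op. */
		mgr->reselect = true;
	} else {
		/* Asynchronous completion: start selection from here. */
		demo_select_next(mgr);
	}
}

/* A synchronous service: it finishes the operation before returning. */
static void demo_process(struct demo_mgr *mgr)
{
	mgr->depth++;
	if (mgr->depth > mgr->max_depth) {
		mgr->max_depth = mgr->depth;
	}
	demo_finalize(mgr);
	mgr->depth--;
}

static void demo_select_next(struct demo_mgr *mgr)
{
	mgr->selecting = true;
	do {
		mgr->reselect = false;
		if (mgr->pending > 0) {
			mgr->pending--;
			demo_process(mgr);
		}
	} while (mgr->reselect);
	mgr->selecting = false;
}
```

With five pending operations and a synchronous service, all five complete while process() never nests more than one level deep. Without the flag and loop, each finalize would call back into selection and the stack would grow with the queue length.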

ST_STOPPING,

/* Service is in an error state. */
ST_ERROR,
Contributor

It seems that there is no way to get out of the error state. How can the manager be reinitialized?

@pabigot
Contributor Author

pabigot commented Mar 8, 2020

Superseded by #23333.
